Sequence-based protein-protein interaction prediction via support vector machine

Authors

  • Yong-Cui Wang
  • Jiguang Wang
  • Zhixia Yang
  • Nai-Yang Deng
Abstract

This paper develops sequence-based methods for identifying novel protein-protein interactions (PPIs) by means of support vector machines (SVMs). The authors encode proteins not only at the gene level but also at the amino acid level. They also design a procedure for selecting the negative training set to deal with the training-data imbalance problem: interacting protein pairs are scarce relative to the large number of non-interacting protein pairs. The proposed methods are validated on PPI data from Plasmodium falciparum and Escherichia coli and yield predictive accuracies of 93.8% and 95.3%, respectively. Functional annotation analysis and database searches indicate that the novel predictions are worthy of future experimental validation. The new methods will be useful supplementary tools for future proteomics studies.
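The abstract names three ingredients: a numeric encoding of each protein pair, a balanced negative (non-interacting) training set, and an SVM classifier. This page does not reproduce the paper's actual gene-level and amino-acid-level encodings or its negative-selection procedure. The Python sketch below therefore substitutes simple stand-ins, chosen as assumptions:

- amino acid composition as the pair encoding;
- uniform random sampling of non-interacting pairs at a 1:1 ratio;
- a linear SVM trained with the Pegasos stochastic subgradient method.

All function names (`aa_composition`, `pair_features`, `sample_negatives`, `train_linear_svm`, `predict`) are hypothetical illustrations, not the authors' code.

```python
import random
from typing import FrozenSet, List, Sequence, Set, Tuple

# Hypothetical sketch: NOT the authors' encoding or negative-selection procedure.
# The 20 standard amino acids; other letters (X, B, Z, U, ...) are ignored.
STANDARD_AA = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(seq: str) -> List[float]:
    """Fraction of each standard amino acid in `seq` (a 20-dim vector)."""
    counts = {aa: 0 for aa in STANDARD_AA}
    for ch in seq.upper():
        if ch in counts:
            counts[ch] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(STANDARD_AA)
    return [counts[aa] / total for aa in STANDARD_AA]


def pair_features(seq_a: str, seq_b: str) -> List[float]:
    """Order-independent pair encoding: the two composition vectors are
    concatenated in sorted sequence order, so (a, b) and (b, a) match."""
    first, second = sorted((seq_a, seq_b))
    return aa_composition(first) + aa_composition(second)


def sample_negatives(positives: Set[FrozenSet[str]], proteins: Sequence[str],
                     rng: random.Random) -> Set[FrozenSet[str]]:
    """Draw as many random unobserved pairs as there are positives (1:1 ratio).
    Assumes `proteins` holds distinct IDs and every positive pair is among them."""
    n = len(proteins)
    available = n * (n - 1) // 2 - len(positives)
    if available < len(positives):
        raise ValueError("not enough candidate pairs for a 1:1 negative set")
    negatives: Set[FrozenSet[str]] = set()
    while len(negatives) < len(positives):
        pair = frozenset(rng.sample(list(proteins), 2))
        if pair not in positives:
            negatives.add(pair)
    return negatives


def train_linear_svm(X: List[List[float]], y: List[int], lam: float = 0.1,
                     epochs: int = 50, seed: int = 0) -> Tuple[List[float], float]:
    """Pegasos-style stochastic subgradient descent on the hinge loss.
    A constant 1.0 feature is appended so the bias is learned (and, as a
    simplification, regularized) together with the weights."""
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    t = 0
    for _ in range(epochs):
        for _ in range(len(data)):
            t += 1
            i = rng.randrange(len(data))
            eta = 1.0 / (lam * t)  # Pegasos step size 1 / (lambda * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink (L2 regularizer)
            if margin < 1.0:  # hinge loss active: step toward y_i * x_i
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w[:-1], w[-1]


def predict(w: List[float], b: float, x: List[float]) -> int:
    """+1 = predicted interacting pair, -1 = predicted non-interacting."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1
```

In use, one would build `X` from `pair_features` over the positive pairs plus the sampled negatives, label them +1 and -1, and call `train_linear_svm`. In practice, an established (often kernel) SVM library would replace the hand-written trainer. Random 1:1 sampling is only the simplest way to counter the imbalance. It assumes every unobserved pair is truly non-interacting, which can introduce label noise into the negative set.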

Similar resources

Protein Secondary Structure Prediction: a Literature Review with Focus on Machine Learning Approaches

DNA sequence, which contains all genetic traits, is not itself a functional entity. Instead, it is converted into protein sequences through the transcription and translation processes. The protein sequence later folds into a 3D structure, which is a functional unit and can manage biological interactions using the information encoded in DNA. Every life process one can imagine is undertaken by proteins with specific functio...

Protein function prediction via graph kernels

MOTIVATION Computational approaches to protein function prediction infer protein function by finding proteins with similar sequence, structure, surface clefts, chemical properties, amino acid motifs, interaction partners or phylogenetic profiles. We present a new approach that combines sequential, structural and chemical information into one graph model of proteins. We predict functional class ...

Identification of Surface Residues Involved in Protein-Protein Interaction – A Support Vector Machine Approach

We describe a machine learning approach for sequence-based prediction of protein-protein interaction sites. A support vector machine (SVM) classifier was trained to predict whether or not a surface residue is an interface residue (i.e., is located in the protein-protein interaction surface) based on the identity of the target residue and its 10 sequence neighbors. Separate classifiers were trai...

Analyses for protein tertiary structure prediction by Mika Takata ( Under the Direction of

Protein fold classification is essential to recognition of protein tertiary structure. It is of particular interest to the structure analyses of proteins of low sequence identity with respect to proteins of known structures. We investigated the protein fold recognition problem with the Committee Support Vector Machine (CSVM) that proved efficient and effective in feature parameterization of bac...

Prediction of Protein Sub-Mitochondria Locations Using Protein Interaction Networks

Background: Prediction of protein localization is among the most important problems in bioinformatics; it is used to predict where proteins reside in cells and in organelles such as mitochondria. In this study, several machine learning algorithms are applied to predict intracellular protein locations. These algorithms use the features extracted from pro...

Domain Linker Region Knowledge Contributes to Protein-protein Interaction Prediction

Protein-protein interaction has proven to be a valuable piece of biological knowledge and a starting point for understanding the internal workings of the cell. In this paper, we propose a novel method for protein-protein interaction prediction using only the primary structural information of the protein sequence. The method is developed based on inter-domain linker region knowledge and a combin...

Journal:
  • J. Systems Science & Complexity

Volume 23

Publication date: 2010